Eliciting Natural Speech From Non-Native Users: Collecting Speech Data For LVCSR

Authors

  • Laura Mayfield Tomokiyo
  • Susanne Burger
Abstract

In this paper, we discuss the design of a database of recorded and transcribed read and spontaneous speech of semi-fluent, strongly-accented non-native speakers of English. While many speech applications work best with a recognizer that expects native-like usage, others could benefit from a speech recognition component that is forgiving of the sorts of errors that are not a barrier to communication; in order to train such a recognizer, a database of non-native speech is needed. We examine how collecting data from non-native speakers must necessarily differ from collection from native speakers, and describe work we did to develop an appropriate scenario, recording setup, and optimal surroundings during recording.

Similar resources

Handling Non-native Speech in LVCSR: A Preliminary Study

In moving towards full incorporation of CSR in applications whose users include non-native speakers, an understanding of how the system can be modified to increase its tolerance to non-native idiosyncrasies such as accented pronunciation and disfluent form is essential. While experiments geared towards restricted-use systems have suggested that extremely simple techniques are effective, prelimin...

Adaptation Methods for Non-native Speech

LVCSR performance is consistently poor on low-proficiency non-native speech. While gains from speaker adaptation can often bring recognizer performance on high-proficiency non-native speakers close to that seen for native speakers [12], recognition for lower-proficiency speakers remains low even after individual speaker adaptation [2]. The challenge for accent adaptation is to maximize recognizer p...

Hypothesis-driven accent discrimination

Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, w...

Exploring Pragmalinguistic and Sociopragmatic Variability in Speech Act Production of L2 Learners and Native Speakers

The pragmalinguistic and sociopragmatic aspects of language use vary across different situations, languages, and cultures. The separation of these two facets of language use can help to map out the socio-cultural norms and conventions as well as the linguistic forms and strategies that underlie the pragmatic performance of different language speakers in a variety of target language use situatio...

Speech-like Pragmatic Markers in Argumentative Essays Written by Iranian EFL Students and Native English Speaking Students

In this study, the use of speech-like pragmatic markers in Iranian EFL students’ academic writing was investigated. Speech-like pragmatic markers, such as I think, well, I guess, actually, anyway, anyhow, etc. are linguistic components that are more specific to conversation than writing, and writers may wrongly include them in their academic writing. To examine the students’ use of speech-like ...

Publication date: 1999